Reproducibility issues in science, is P value really the only answer?

Authors

  • Jean Gaudart
  • Laetitia Huiart
  • Paul J Milligan
  • Rodolphe Thiebaut
  • Roch Giorgi
Abstract

Johnson describes the lack of reproducibility of scientific studies, which the author attributes to insufficiently stringent significance levels (1). We appreciate the quality of this work and its importance for the interpretation of statistical evidence. These results should be considered in statistical guidelines. Nevertheless, we would like to raise some important points not thoroughly discussed in this publication.

Not publishing “nonsignificant” results leads to the well-known publication bias, whereby studies with low statistical power are underrepresented. Under a stricter threshold this bias would become more severe, despite recommendations to allow for publication of “negative” results. For a fixed sample size, lowering the significance level will further increase the type II error, which is clinically as important as the type I error. Focusing only on the type I error may lead to an excessive false nondiscovery rate. In the case of severe diseases, it is not uncommon to fix the significance level at 0.1 (2) in the early stages, to avoid excluding an effective treatment. Johnson argues that the loss of power may be corrected by increasing the sample size. However, increasing the size of clinical trials will reduce their feasibility and increase their duration. Aside from these issues, including more patients means exposing more patients to an experimental treatment and may challenge the equipoise concept.

The issue of fixing a threshold that defines significance goes back to the Fisher–Pearson controversy. Estimating a P value is needed to quantify the strength of evidence. However, fixing a threshold is needed to make a decision while controlling the risks of type I and type II errors. Regarding the issue addressed by Johnson, it would be interesting to assess whether a priori specification of the threshold is required, or whether research results could be compared using the P value and the magnitude of the test statistic. The issue of the significance level is only the tip of the iceberg.
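
The trade-off between the significance level, the type II error, and the sample size can be made concrete with a rough calculation. The sketch below is illustrative only: it assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size of 0.5, none of which comes from the letter or from Johnson's paper, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_two_sample_z(effect_size, n_per_group, alpha):
    """Approximate power of a two-sided, two-sample z-test (equal group sizes)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic under the assumed standardized effect.
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


def n_per_group_for_power(effect_size, alpha, power):
    """Approximate n per group for a target power (usual normal-approximation formula)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


ASSUMED_EFFECT = 0.5  # hypothetical standardized effect size, chosen for illustration

for alpha in (0.1, 0.05, 0.005):
    n_needed = n_per_group_for_power(ASSUMED_EFFECT, alpha, power=0.8)
    power_at_63 = power_two_sample_z(ASSUMED_EFFECT, 63, alpha)
    print(f"alpha={alpha}: n per group for 80% power = {n_needed}, "
          f"power with 63 per group = {power_at_63:.2f}")
```

Under these assumptions, a trial sized for 80% power at a significance level of 0.05 (63 patients per group) has only about 50% power at 0.005, and restoring 80% power requires about 107 patients per group. The exact figures depend entirely on the assumed effect size and test.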
Indeed, design issues should not be overlooked when discussing lack of reproducibility. Selection bias leads to extrapolation of results to a population different from the target population (3). Furthermore, the “poor reporting” practice highlighted by Altman et al. (4) and the lack of compliance with reporting recommendations (e.g., Consolidated Standards of Reporting Trials) hinder a proper assessment of the quality of a study and hide selection bias or misuse of statistical tests; the latter leads to nonreproducibility of the reported research. In an extreme example, monthly American air passenger numbers and Australian electricity production in the late 1950s are highly correlated (Pearson’s correlation = 0.88, P = 8.8 × 10) without any meaning. The causality criteria defined by Hill (5) highlight other important considerations in the interpretation of results. Reliance on P values remains surprisingly widespread, but good decision making depends on the magnitude of effects, the plausibility of scientific explanations of the mechanism, and the reproducibility of the findings by others.
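
The spurious-correlation example can be imitated with synthetic data. The sketch below does not use the airline or electricity series; it assumes two independently generated monthly series that merely share an upward trend, with arbitrary slopes and noise levels, and all names are hypothetical. It compares the Pearson correlation of the raw levels with that of the month-to-month changes.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def trending_series(n_months, slope, noise_sd, rng):
    """Synthetic monthly series: a linear trend plus independent Gaussian noise."""
    return [slope * t + rng.gauss(0, noise_sd) for t in range(n_months)]


def first_differences(series):
    """Month-to-month changes, which remove a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(0)  # fixed seed so the illustration is repeatable
# Two unrelated series over 60 months; slopes and noise levels are arbitrary.
series_a = trending_series(60, slope=1.0, noise_sd=5.0, rng=rng)
series_b = trending_series(60, slope=3.0, noise_sd=15.0, rng=rng)

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(first_differences(series_a), first_differences(series_b))
print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The levels are strongly correlated only because both series grow over time; once the trend is removed by differencing, little association is expected to remain. A very small P value for the raw correlation would therefore say nothing about a causal link.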

Similar resources

A Brief Philosophical Encounter with Science and Medicine

We show a lot of respect for science today. To back up our claims, we tend to appeal to scientific methods. It seems that we all agree that these methods are effective for gaining the truth. We can ask why science has its special status as a supplier of knowledge about our external world and our bodies. Of course, one should not always trust what scientists say. Nonetheless, epistemological jus...

The reproducibility of research and the misinterpretation of p-values

We wish to answer this question: If you observe a 'significant' p-value after doing a single unbiased experiment, what is the probability that your result is a false positive? The weak evidence provided by p-values between 0.01 and 0.05 is explored by exact calculations of false positive risks. When you observe p = 0.05, the odds in favour of there being a real effect (given by the likelihood r...

Exploring the Meaning of Quality from Urban Space Users’ Viewpoint by Analyzing Conceptual Environment Codes

The main purpose of urban design is to create good and high-quality urban spaces and environments for people to live while such quality may not be determined only by imposing a structural, perceptual and value system of the designer. It can be said that human and his powers to perceive surrounding environments are the focus of urban design. Having reviewed previous researches and theories in re...

An Explanation of Some Philosophical Issues in Economics

The main purpose of this paper is to find an answer to the key question: “What kind of philosophical issues could be expected to be dealt with in economics?” Having reviewed the related literature, the researchers try to explain the philosophy of economics in the context of three domains, i.e. ontology, epistemology, and methodology of economics. The results suggest that ontological issues in e...

Design of Persian Karbandi: The Problem of Dividing the Base from a Mathematical Viewpoint

Karbandi is the structure of a kind of roofing in Persian architecture. One of the main issues related to the design of karbandi is that, due to its geometrical structure, it is not possible to design any desired karbandi on a given base. Therefore, it is necessary for the designer to be able to discern the proper karbandi for a given base. The most critical stage in designing a karbandi is whe...

Journal:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume 111, Issue 19

Publication date: 2014